Naïve Bayesian Based on Chi Square to Categorize Arabic Data
Abstract
Text classification is a supervised technique that uses labelled training data to learn a classifier and then applies that classifier automatically to the remaining text. This paper investigates the Naïve Bayesian algorithm combined with the Chi Square feature selection method. Our comparisons are based on the macro F1, macro recall and macro precision evaluation measures. Experimental results on different Arabic text categorization data sets provide evidence that feature selection often increases classification accuracy by removing rare terms.
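The pipeline described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the Naïve Bayes variant, the smoothing, or how per-class Chi Square scores are combined, so the sketch assumes a multinomial model with add-one smoothing and takes the maximum score over classes. All function names and the toy English tokens are hypothetical.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class table of document counts.

    n11: in class, contains term      n10: outside class, contains term
    n01: in class, lacks term         n00: outside class, lacks term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k terms with the highest chi-square score (max over classes)."""
    class_size = Counter(labels)
    df = Counter()           # document frequency of each term
    df_in_class = Counter()  # document frequency of each (term, class) pair
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_in_class[term, label] += 1
    scores = {}
    for term in df:
        best = 0.0
        for c in class_size:
            n11 = df_in_class[term, c]
            n10 = df[term] - n11
            n01 = class_size[c] - n11
            n00 = len(docs) - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""
    vocab = set(vocab)
    class_docs = Counter(labels)
    term_counts = {c: Counter() for c in class_docs}
    for tokens, label in zip(docs, labels):
        term_counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for c in class_docs:
        total = sum(term_counts[c].values()) + len(vocab)
        log_prior = math.log(class_docs[c] / len(docs))
        log_like = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}
        model[c] = (log_prior, log_like)
    return model


def predict_nb(model, tokens):
    """Pick the class with the highest log posterior; unselected terms are ignored."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(sorted(model), key=score)


def macro_scores(y_true, y_pred):
    """Macro precision, recall and F1: unweighted means of per-class values."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy corpus with placeholder tokens (not data from the paper).
docs = [["match", "goal", "team"], ["team", "goal", "league"],
        ["market", "bank", "price"], ["bank", "price", "rare"]]
labels = ["sport", "sport", "economy", "economy"]
vocab = select_features(docs, labels, k=4)
model = train_nb(docs, labels, vocab)
print(vocab, predict_nb(model, ["goal", "team"]))
```

In this toy corpus, terms that occur in only one document score lower than terms that occur in every document of a class, so a small k drops them first. This mirrors the rare-term pruning the abstract refers to.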
Similar resources
Bayesian assessment of goodness-of-fit against nonparametric alternatives
The classical chi-square test of goodness-of-fit compares the hypothesis that data arise from some parametric family of distributions, against the nonparametric alternative that they arise from some other distribution. However, the chi-square test requires continuous data to be grouped into arbitrary categories. Furthermore, as the test is based upon an approximation, it can only be used if the...
Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining
As a probability-based statistical classification method, the Naïve Bayesian classifier has gained wide popularity despite its assumption that attributes are conditionally mutually independent given the class label. Improving the predictive accuracy and achieving dimensionality reduction for statistical classifiers has been an active research area in datamining. Our experimental results suggest...
Chi Square Feature Extraction Based Svms Arabic Language Text Categorization System
This paper aims to implement a Support Vector Machines (SVMs) based text classification system for Arabic language articles. This classifier uses CHI square method as a feature selection method in the pre-processing step of the Text Classification system design procedure. Comparing to other classification methods, our system shows a high classification effectiveness for Arabic data set in term ...
Word sense disambiguation for arabic text categorization
In this paper, we present two contributions for Arabic Word Sense Disambiguation. In the first one, we propose to use both two external resources AWN and WN based on Term to Term Machine Translation System (MTS). The second contribution relates to the disambiguation strategies, it consists of choosing the nearest concept for the ambiguous terms, based on more relationships with different concep...
A Socio-Cultural Study of Language Teacher Status
The present study pursued two goals: First, to discover the subscales underlying the teacher Status Scale (TSS); and second, to reveal the status of the teachers of Persian, Arabic, and English in Iranian junior high school students’ perceptions in order to determine the relative roles of national, religious, and western influences in the identity construction of the students. The data was coll...